fix: handle pandas 3.0 default StringDtype #1777

Open
filippsatverily wants to merge 1 commit into cdisc-org:main from filippsatverily:filipps/pandas3-handle-stringdtype


filippsatverily (Contributor) commented Jun 22, 2026

Pandas 3.0 changes the default string dtype from object to StringDtype, which requires the following changes:

  1. Regex operators: .map() now returns the nullable BooleanDtype, in which missing values stay pd.NA (pd.NA & True evaluates to pd.NA, not False) and raise TypeError when used as a plain bool. Adds a _map_regex() helper that normalizes the result to numpy bool via .fillna(False).astype(bool); all prefix/suffix/matches regex operators use it.

  2. Case-insensitive comparisons: calling .lower() on a non-string value such as pd.NA raises AttributeError, so the comparison now checks isinstance(target_val, str) before calling .lower().

  3. Empty-column detection in record_count: the previous check, dtype == "object", only identified string columns under the pre-3.0 default and misses StringDtype columns. It now uses pd.api.types.is_string_dtype() instead.

  4. Date validation: simplifies the is_valid_date guard to a single not isinstance(date_string, str) check, which already covers None, pd.NA and any other non-string value.
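A minimal sketch of the normalization in item 1, assuming a Series of the nullable "string" dtype. It uses Series.str.match as a stand-in for the regex callback the PR passes to .map(), because str.match on this dtype returns the nullable "boolean" dtype; the name map_regex is hypothetical and is not the PR's _map_regex().

```python
import pandas as pd


def map_regex(values: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical stand-in for the PR's _map_regex() helper.

    Matches `pattern` at the start of each value and returns a plain
    numpy-bool Series in which missing values become False.
    """
    # On the nullable "string" dtype, str.match returns BooleanDtype and
    # missing inputs stay <NA>.
    matched = values.str.match(pattern)
    # Normalize to numpy bool so later &, |, ~ and boolean indexing
    # never encounter pd.NA.
    return matched.fillna(False).astype(bool)


codes = pd.Series(["AESEQ", None, "DMSEQ"], dtype="string")

raw = codes.str.match(r"^AE")
print(raw.dtype)  # boolean

mask = map_regex(codes, r"^AE")
print(mask.dtype, mask.tolist())  # bool [True, False, False]
print(codes[mask].tolist())  # ['AESEQ']
```

Treating a missing value as "no match" mirrors the .fillna(False) choice described in item 1.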
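The guard from item 2 can be sketched as follows. equals_ignore_case is a hypothetical helper, not code from the PR, and it assumes that a non-string on either side simply counts as "not equal"; the PR's actual fallback for non-strings may differ.

```python
import pandas as pd


def equals_ignore_case(value: object, target_val: object) -> bool:
    """Hypothetical case-insensitive comparison with an isinstance guard.

    pd.NA (like None or NaN) has no .lower() method, so .lower() is only
    called once both operands are confirmed to be str. Assumption: any
    non-string operand counts as "not equal".
    """
    if not (isinstance(value, str) and isinstance(target_val, str)):
        return False
    return value.lower() == target_val.lower()


column = pd.Series(["Yes", None, "NO"], dtype="string")
print([equals_ignore_case(v, "yes") for v in column])  # [True, False, False]
```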
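For item 3, the difference between the two dtype checks shows up on a small frame. string_columns is a hypothetical helper; how record_count then decides that a column is empty is not described in the PR and is left out of this sketch.

```python
import pandas as pd


def string_columns(frame: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns that hold string data.

    A check like `dtype == "object"` only finds the pre-3.0 default;
    pd.api.types.is_string_dtype() also recognizes StringDtype columns.
    """
    return [
        name for name in frame.columns
        if pd.api.types.is_string_dtype(frame[name])
    ]


frame = pd.DataFrame(
    {
        "AETERM": pd.Series(["HEADACHE", None], dtype="string"),
        "DOMAIN": pd.Series(["AE", "AE"], dtype=object),
        "AESEQ": [1, 2],
    }
)

# The old-style check misses the StringDtype column.
print([name for name in frame.columns if frame[name].dtype == "object"])  # ['DOMAIN']
print(string_columns(frame))  # ['AETERM', 'DOMAIN']
```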
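The single guard in item 4 works because None, pd.NA and NaN (a float) all fail isinstance(..., str). In this sketch is_valid_date is a hypothetical reduced version: the real function's parsing rules are not shown in the PR, so datetime.date.fromisoformat is only a placeholder assumption for them.

```python
from datetime import date

import pandas as pd


def is_valid_date(date_string: object) -> bool:
    """Hypothetical sketch of is_valid_date with the simplified guard.

    Assumption: date.fromisoformat stands in for the real parsing rules.
    """
    # One check rejects None, pd.NA, float NaN and every other non-str.
    if not isinstance(date_string, str):
        return False
    try:
        date.fromisoformat(date_string)
    except ValueError:
        return False
    return True


samples = ["2026-06-22", "2026-02-30", None, pd.NA, float("nan")]
print([is_valid_date(s) for s in samples])  # [True, False, False, False, False]
```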

Tested scenarios:

  • Full pytest suite: 1746 passed, 11 skipped, 0 failed (pandas 2.3.3, dask 2025.12.0)
  • Ran validation on CDISC_Pilot_Study_v4_FIXED.json: 201 SUCCESS, 6 SKIPPED, 0 errors

filippsatverily marked this pull request as ready for review June 22, 2026 21:40
filippsatverily (Contributor, Author) commented:

@SFJohnson24 another commit from #1745
